feat(expert): human-in-the-loop tool permissions (#421)#7639

Open
andypalmi wants to merge 53 commits into
mainfrom
feat/421-expert-tool-permissions
@andypalmi

@andypalmi andypalmi commented Jun 30, 2026

Contributor

Human-in-the-loop tool permissions for the Expert

Implements per-tool human-in-the-loop permissions for the Expert's flow-building tools, in the immersive editor, as described in FlowFuse/product#421. The builder (and their team role) controls which flow-building actions the Expert may run, which need approval, and which are off limits, so it never makes a change they would not have allowed.

Stacked on #7635 (feat/408-expert-plan-mode), which is the base of this PR and should merge first.

What it does

  • Inline approval card in chat when a tool's policy is Ask: friendly tool name, action type (Read / Write / Delete) and the concrete call parameters as prettified JSON, with Allow / Always allow / Deny / Always deny. "Always allow" and "Always deny" apply for the rest of the current chat and reset on Start Over and on refresh; after a choice the card shows exactly what was picked and collapses the payload. The agent pauses on the round-trip with no session timeout, however long the user takes; the chat stop button cancels it (treated as denied).
  • Per-team settings (in the Expert settings dialog): permissions are saved per team. Each tool group (flow-building, platform) has its own always-visible default permissions — a per-action-type default (Always allow / Ask / Always deny) for read, write and delete — and collapses its individual-tool overrides behind an accordion. The policy control is a fast three-button toggle rather than a dropdown.
  • Per-tool overrides that stay put: a permission set on an individual tool overrides its type default and keeps that setting until reset. Each type default shows an "N set individually" count of the tools in that scope carrying their own saved permission, and a Reset action beside it returns those tools to the default. Session-only grants are excluded from the count — they appear per-tool and reset on their own.
  • Make permanent: a tool granted only for the current chat shows a "Make permanent" action to save that choice for the team.
  • Section ordering by context: flow-building tools lead in the immersive editor; platform tools lead in the app.
  • Role inheritance, fail-closed: read-only team members cannot enable or trigger write/delete tools and see why; the agent also fails closed server-side.
  • Version gating: each tool carries a min/max nr-assistant version. Tools render as available, "update required", or deprecated against the instance's nr-assistant version. Versioned variants (e.g. Manage Groups v1/v2) collapse into one row and resolve to the in-range variant, nudging an update to the newest variant's min version when behind.
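A minimal sketch of how the effective policy for one tool could be resolved from the pieces listed above. The function names, the object shapes, the tool names, and the precedence order (role check, then session grant, then saved per-tool override, then type default) are assumptions for illustration, not the store's actual implementation.

```javascript
// Hypothetical policy values; the real store may name these differently.
const POLICIES = ['allow', 'ask', 'deny']

// Resolve the effective policy for one tool.
// Assumed precedence: role check > session grant > saved override > type default.
function resolvePolicy (tool, { typeDefaults, savedOverrides = {}, sessionGrants = {}, readOnlyRole = false }) {
    // Fail closed: a read-only member never gets write/delete tools.
    if (readOnlyRole && tool.scope !== 'read') return 'deny'
    const candidate = sessionGrants[tool.name] ?? savedOverrides[tool.name] ?? typeDefaults[tool.scope]
    // Missing or unrecognised values also fail closed (an assumption of this sketch).
    return POLICIES.includes(candidate) ? candidate : 'deny'
}

// Count the tools in a scope that carry a saved (not session-only) override,
// mirroring the "N set individually" indicator.
function countSetIndividually (tools, scope, savedOverrides) {
    return tools.filter(t => t.scope === scope && t.name in savedOverrides).length
}

// Illustrative data only; these tool names are made up.
const tools = [
    { name: 'get-flows', scope: 'read' },
    { name: 'add-node', scope: 'write' },
    { name: 'delete-node', scope: 'delete' }
]
const settings = {
    typeDefaults: { read: 'allow', write: 'ask', delete: 'ask' },
    savedOverrides: { 'add-node': 'allow' },
    sessionGrants: { 'delete-node': 'deny' }
}
```

With this data, `add-node` resolves to its saved override, `delete-node` to its session grant, and the write scope reports one tool set individually.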

Architecture

  • The agent decides policy at the toolsNode seam (sibling of the plan-mode gate): role check first, then per-tool policy. allow runs, deny feeds the denial back to the model so it adapts and explains, ask publishes expert:tool-approval and awaits the browser's decision.
  • The flow-building tool catalog is served over the agent's GET /mcp/flow-tools endpoint (friendly name + scope + version window only). Forge exposes GET /api/v1/expert/mcp/tools, which proxies that endpoint and returns the merged catalog: FlowFuse platform tools are curated into the same array (tagged as a platform group) — wired and commented out until the platform-tool work is merged, at which point it's a one-line switch. Every chat response carries a hash of the flow-building catalog; the browser refetches only when the hash diverges, so it stays correct across rolling deploys where instances can be on different versions.
  • Saved per-team choices and per-chat session grants live in the existing product-assistant / product-expert Pinia stores.
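For reviewers without the agent side to hand, the gate ordering described above (role check first, then per-tool policy) might look roughly like the sketch below. `decideToolCall` and the returned action shapes are hypothetical; only the ordering and the `expert:tool-approval` event name come from this description.

```javascript
// Decide what to do with one tool call. The caller is expected to act on the
// returned descriptor (run the tool, feed the denial back, or publish the event).
function decideToolCall (call, context) {
    // 1. Role check first: read-only members can only run read tools.
    if (context.readOnlyRole && call.scope !== 'read') {
        return { action: 'reject', feedback: `Your team role does not permit ${call.scope} tools.` }
    }
    // 2. Then the resolved per-tool policy.
    switch (call.policy) {
    case 'allow':
        return { action: 'run' }
    case 'ask':
        // The browser's decision arrives separately; this sketch only names the event.
        return { action: 'ask', event: 'expert:tool-approval' }
    default:
        // 'deny' and anything unrecognised: fail closed and tell the model why.
        return { action: 'reject', feedback: 'This tool is denied by the current permissions.' }
    }
}
```

Returning a descriptor rather than awaiting inside the gate keeps the decision itself synchronous and easy to unit test.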

UI

The settings panel follows existing FlowFuse patterns: FormHeading for section titles, ff-data-table for the defaults and tool lists, ff-accordion to collapse the per-tool detail, and the shared three-button toggle for each policy control, so each tool name lines up with its own control across the row border.

Out of scope (follow-ups)

  • Admin-configurable team-wide default policy (DB migration + admin UI + server enforcement).
  • Enabling platform (non-flow-building) tools in the catalog, once the platform-tool work is merged into the agent. The forge endpoint has the curation wired and commented, ready to switch on.

Testing

  • Build + color/eslint lint green.
  • Automated unit tests:
    • forge/ee/routes/expert/index_spec.js (new MCP tools Endpoint block): GET /mcp/tools auth (401 for instance/device tokens), team-access (404 for non-members), missing teamId (400 from the querystring schema), the flow-tools catalog + hash proxy (asserting the upstream /mcp/flow-tools URL and service token), the empty-response defaults (catalog: [], hash: null), and upstream error-status propagation.
    • frontend/src/stores/product-assistant.spec.js (new tool-permissions block): the permission-resolution engine, i.e. the classOf/groupOf helpers, per-team class defaults, saved vs. session policy resolution, resolvedToolPermissions, version gating (toolAvailabilityFor), catalog/preference/override mutations, resetGroupClassPreferences, promoteSessionOverride, and the pending-approval registry.
    • frontend/src/stores/product-expert-tool-permissions.spec.js (new): catalog fetch (success / no-team / error) and the approval round-trip (session short-circuit, resolve, always-allow, always-deny, cancel).
  • Manual: catalog populates on opening the immersive editor; Ask shows the card and pauses with no timeout (Allow applies to canvas, Deny explains gracefully); "Always allow"/"Always deny" apply for the chat and reset on Start Over / refresh; "Make permanent" saves the choice per team; a per-tool override holds against changing the type default, the "N set individually" count reflects it, and Reset returns those tools to the default; per-team settings stay separate across team switches and survive reload; read-only role sees write/delete disabled with a reason and cannot trigger them; leaving immersive hides the panel and stops sending permissions; chat stop while a card is open recovers cleanly.
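The empty-response defaults and the hash-driven refetch covered by the tests above could be sketched as follows. Both helper names are hypothetical, and treating a response without a hash as "nothing to compare" is an assumption of this sketch.

```javascript
// Normalise an upstream catalog response to the documented empty defaults
// (catalog: [], hash: null).
function normaliseCatalogResponse (body) {
    return {
        catalog: Array.isArray(body?.catalog) ? body.catalog : [],
        hash: typeof body?.hash === 'string' ? body.hash : null
    }
}

// Refetch only when a chat response carries a hash that differs from the stored one.
function shouldRefetchCatalog (storedHash, responseHash) {
    if (!responseHash) return false // assumption: no hash means nothing to compare
    return storedHash !== responseHash
}
```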

Requires matching agent-side changes.

Refs FlowFuse/product#421

Screenshots

Expert Settings with Tools permissions

Screenshot 2026-07-01 at 17 24 10

Counter of how many tools have different permissions than their scope's permission

Screenshot 2026-07-01 at 17 24 32

Permissions reset and change behaviour per scope

Screen.Recording.2026-07-01.at.17.24.46.mov

Option to save a permission set for the current session

Screenshot 2026-07-01 at 17 32 07
Screen.Recording.2026-07-01.at.17.32.09.mov

Approval Cards

Screen.Recording.2026-07-01.at.17.30.45.mov
Screenshot 2026-07-01 at 17 31 43

cstns and others added 14 commits June 22, 2026 19:28
…crash

Add an ErrorBoundary and wrap each answer item in it, so a failure in one
section degrades only that section instead of blanking the whole message.
Also guard the optional streamable chain in StandardResourceCard that could
throw on a null value and take down the message.
)

The Expert can ask 1-4 clarifying questions in a single turn, each rendered as
its own single- or multi-select option card; all answers are collected before
the turn is submitted. Answered cards can be edited and resubmitted, and a card
from a past turn is disabled once a newer message arrives.

Adds a follow-up-questions cadence setting (all at once vs one at a time) in the
composer settings menu, shipped to the agent via the expert context.

Add an always-visible Plan mode toggle to the composer. When enabled, the Expert
proposes a plan instead of making changes, rendered as a plan card with Approve,
Edit, Request changes and Reject actions:

- Approve exits plan mode and proceeds with the plan.
- Edit loads the plan markdown into the composer for direct editing.
- Request changes focuses an empty composer to describe a change in words.
- Reject abandons the plan.

The plan card renders its markdown through RichContent (passing the message and
answer uuids it requires), and reuses the composer's pending-input and auto-grow
behaviour. Plan mode and the approval signal are shipped to the agent via the
expert context.

Plan mode is only meaningful inside the instance/device editor for now,
so gate the composer toggle on immersive mode and force the persisted
planMode off whenever the user is outside immersive (including on load),
preventing a stale value from being sent in non-immersive contexts.

- Guard the optional streamable chain in FlowResourceCard directly instead
  of relying on a render boundary to mask the throw. Reduce ErrorBoundary to
  a single last-resort backstop per answer item in AiMessage; drop the
  per-section boundary wrappers in AnswerWrapper.
- Rewrite QuestionsList on top of the existing ff-radio-group (single-select)
  and ff-checkbox (multi-select) components so options look like standard,
  clickable form controls and stay consistent with the rest of the app.
- Replace the imperative growComposerToContent DOM measuring with CSS
  field-sizing on the textarea; drop the manual reflows and the auto-grown
  flag. The composer auto-sizes to content and pins to an explicit height
  only after a drag-resize.

Per review: the local-catch pattern was the only one in the frontend; the
rest of the app leans on the global app.config.errorHandler. The real throws
are now guarded at their source (the optional streamable chains in the
resource cards), so the boundary was redundant. Remove it entirely and let
genuinely unexpected render errors surface through the global handler like
everywhere else.

Replace the composer kebab menu with a settings gear that opens an
ff-dialog. The follow-up-questions cadence control now lives in the
dialog as an ff-radio-group, with a FormHeading per section so the
panel can grow as more settings are added.
…n-mode

# Conflicts:
#	frontend/src/components/expert/components/ExpertChatInput.vue
#	frontend/src/components/expert/components/messages/components/AnswerWrapper.vue
Add per-tool approval for the Expert's flow-building tools in the immersive
editor. The agent gates each tool call at the toolsNode seam by class
(read/write/delete) and per-tool preference; write/delete default to Ask and
surface an inline approval card (Allow / Always allow / Never) that holds the
call open with no session timeout, while read defaults to allow.

- Catalog delivered over HTTP (GET /api/v1/expert/mcp/tools), curated to
  friendly names so raw tool identifiers never reach the browser; a per-response
  hash triggers a background refetch when the catalog drifts.
- HITL state consolidated into the product-assistant store (defaults,
  per-tool preferences, pending-approval map) with SemVer version gating.
- Settings panel groups versioned tool variants into one family and points
  update hints at the newest variant's required version.
- Role inheritance is fail-closed: read-only members cannot enable or trigger
  write/delete tools and are shown why.
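
The SemVer gating and variant grouping mentioned in this commit might be sketched as below. The comparison is a simplified major.minor.patch check that ignores prerelease tags, and `availabilityFor`, `resolveFamily`, and the `min`/`max` field names are hypothetical stand-ins rather than the store's real helpers.

```javascript
// Parse "1.2.3" into [1, 2, 3]; prerelease/build suffixes are ignored in this sketch.
function parseVersion (v) {
    return String(v).split('-')[0].split('.').map(n => parseInt(n, 10) || 0)
}

// Returns -1, 0 or 1, like a sort comparator.
function compareVersions (a, b) {
    const pa = parseVersion(a)
    const pb = parseVersion(b)
    for (let i = 0; i < 3; i++) {
        const d = (pa[i] || 0) - (pb[i] || 0)
        if (d !== 0) return d < 0 ? -1 : 1
    }
    return 0
}

// Availability of one variant against the instance's nr-assistant version.
function availabilityFor (variant, instanceVersion) {
    if (variant.min && compareVersions(instanceVersion, variant.min) < 0) return 'update-required'
    if (variant.max && compareVersions(instanceVersion, variant.max) > 0) return 'deprecated'
    return 'available'
}

// Collapse a family of variants (assumed non-empty) into one row.
function resolveFamily (variants, instanceVersion) {
    const inRange = variants.find(v => availabilityFor(v, instanceVersion) === 'available')
    const newest = variants.reduce((a, b) => compareVersions(a.min || '0.0.0', b.min || '0.0.0') >= 0 ? a : b)
    const result = inRange
        ? { status: 'available', variant: inRange }
        : { status: availabilityFor(newest, instanceVersion) }
    // Point the update hint at the newest variant's min version when behind it.
    if (newest.min && compareVersions(instanceVersion, newest.min) < 0) result.updateTo = newest.min
    return result
}
```
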
Use FormHeading for the section titles and ff-data-table for both the
action-type defaults and the flow-building tool list, replacing the
bespoke section/group styling and the non-standard uppercase scope
headers. Bordered table rows pair each tool with its permission control
across the row rather than leaving them to float across whitespace; tool
scope moves into a Type column.

The approval card no longer sends or renders a tool summary; the tool
name, scope and call parameters describe the action.
@andypalmi andypalmi force-pushed the feat/421-expert-tool-permissions branch from 8a97b8e to bdb36fb Compare June 30, 2026 13:46
@codecov

codecov Bot commented Jun 30, 2026


Codecov Report

❌ Patch coverage is 89.28571% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 75.37%. Comparing base (676f620) to head (bafceb9).
⚠️ Report is 1 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| forge/ee/routes/expert/index.js | 92.59% | 2 Missing ⚠️ |
| forge/comms/platformAutomation.js | 0.00% | 1 Missing ⚠️ |

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7639      +/-   ##
==========================================
+ Coverage   75.35%   75.37%   +0.02%     
==========================================
  Files         425      425              
  Lines       22487    22518      +31     
  Branches     5930     5945      +15     
==========================================
+ Hits        16944    16973      +29     
- Misses       5543     5545       +2     
Flag Coverage Δ
backend 75.37% <89.28%> (+0.02%) ⬆️

…nd platform tools

Fetch the tool catalog when the Expert panel mounts (not only in the
editor) so the permissions settings render wherever the Expert is.

Split the settings into a Flow Building Tools section, with its own
per-action-type default permissions, and a separate FlowFuse Platform
Tools section (a placeholder until those tools ship, with TODOs marking
where they get mapped in). Flow-building tools are listed everywhere but
noted as usable only from an instance editor.
- Show plain Read / Write / Delete scope instead of phrases like Read only
- Stop the Setup Guide badge rendering above the approval card
- Disable the action buttons as soon as a choice is made
andypalmi added 6 commits June 30, 2026 16:34
handleMessageResponse lived in the composer's send handler, so a reply to a
query sent from the questions card was fetched but never rendered over HTTP
(without comms-beta the reply only renders via the MQTT push handler). Fold
the response handling into handleQuery so every entry point (composer and the
question/plan cards) renders the reply without re-implementing it.
…n-mode

# Conflicts:
#	frontend/src/components/expert/components/ExpertChatInput.vue
#	frontend/src/stores/product-expert.js
…rmissions

# Conflicts:
#	frontend/src/components/expert/components/ExpertChatInput.vue
Raise the conversation-history expiry from 28 to 30 minutes (warning at
27), so the human-in-the-loop tool-approval wait, which is bounded by the
session lifetime, has the full 30-minute window the agent now allows.
Comment thread frontend/src/stores/product-expert.js Outdated
for (const m of this._agentStore.messages) {
    if (!Array.isArray(m.answer)) continue
    for (const a of m.answer) {
        if (a.kind === 'tool-approval' && a.status === 'pending') a.status = 'denied'

Contributor

Couldn't test this without the agent side, but reading the code: on Stop, cancelPendingToolApprovals sets status='denied' on the store answer, but the card renders a shallow copy of it (useStreamingList({ shallow: true })), so its status prop never updates. Worth confirming, but it looks like Stop won't resolve an open approval card.

Contributor Author

Good catch, confirmed. The card renders a detached streaming copy of the answer (AiMessage uses useStreamingList with shallow: true), so writing the status onto the store message never reached it. On Stop the buttons stayed live.

Fixed by recording the outcome in a reactive per-id map (toolApprovalStatuses) on the product-assistant store. AnswerWrapper now feeds the card its status from that map, so an external resolution (Stop / Start Over) updates a card the user never pressed. localStatus stays for instant feedback on the user's own press. Added store and product-expert tests covering the denied-on-cancel path.


// A new chat drops the per-session tool grants ("Always allow/deny for this chat").
useProductAssistantStore().clearSessionToolOverrides()

Contributor

Do we want a cancelPendingToolApprovals() here too?

Contributor Author

Yes. startOver now calls cancelPendingToolApprovals() first, so any approval still awaiting a decision resolves (as denied) and the agent's paused tool call unblocks instead of hanging on a message we are about to drop. It also clears toolApprovalStatuses alongside the session overrides.

@n-lark n-lark left a comment

Contributor

Hey, so I cannot test the approve/deny part of this in the chat because the staging env does not have posthog synced up. The permissions page under the settings UI looks fine to me, but I don't feel comfortable approving this since I cannot test it and am unfamiliar with this feature. I'd recommend @cstns or @Steve-Mcl take a look.

…421)

Add automated coverage for the human-in-the-loop tool-permission work:

- forge GET /mcp/tools: auth (401 instance/device), team-access (404),
  missing teamId (400), catalog+hash proxy, empty-response defaults and
  upstream error propagation.
- product-assistant store: permission-resolution engine (class/group
  helpers, per-team defaults, per-tool and session overrides, resolved
  permissions, version gating, and the pending-approval registry).
- product-expert store: catalog fetch and the approval round-trip
  (session short-circuit, resolve/always-allow/always-deny, cancel).

The approval card renders a detached streaming copy of its answer, so
writing a resolved status onto the store message never reached it. On
chat stop the card stayed on its Allow/Deny buttons even though the
pending call had been denied.

Record approval outcomes in a reactive per-id map on the product-assistant
store and have AnswerWrapper feed the card its status from that map, so an
external resolution (chat stop / Start Over) updates a card the user never
pressed. Start Over now also cancels open approvals and clears the map.
@andypalmi

Contributor Author

Thanks for the review. Pushed a fix for the Stop issue you spotted.

Root cause: the approval card renders a detached streaming copy of its answer (AiMessage uses useStreamingList with shallow: true), so a status written onto the store message never reached the card. Clicking Allow/Deny worked only because the card tracks its own localStatus; the external Stop path had no way in, so the buttons stayed live.

Fix: approval outcomes are now recorded in a reactive per-id map (toolApprovalStatuses) on the product-assistant store, and AnswerWrapper feeds the card its status from that map. That covers external resolutions, Stop and Start Over, on a card the user never pressed. Start Over also cancels open approvals first so the paused tool call unblocks. This is in-memory session state only, not persisted, same lifecycle as the session overrides. Added store and product-expert tests for the denied-on-cancel path.
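
To make the fix easier to review, here is a framework-free sketch of the idea. `toolApprovalStatuses`, `cancelPendingToolApprovals` and `localStatus` are the names used above; everything else (the factory, `open`, `resolve`, `clear`, `cardStatus`) is hypothetical, and a plain object stands in for the reactive map held by the store.

```javascript
// Plain-object stand-in for the per-id outcome map on the store.
function createApprovalStatusStore () {
    const toolApprovalStatuses = {} // approval id -> 'allowed' | 'denied'
    const pending = new Set()
    return {
        toolApprovalStatuses,
        open (id) { pending.add(id) },
        resolve (id, status) {
            pending.delete(id)
            toolApprovalStatuses[id] = status
        },
        // External resolution (Stop / Start Over): every open approval becomes denied.
        cancelPendingToolApprovals () {
            for (const id of [...pending]) this.resolve(id, 'denied')
        },
        // Same lifecycle as the session overrides: dropped on a new chat.
        clear () {
            pending.clear()
            for (const id of Object.keys(toolApprovalStatuses)) delete toolApprovalStatuses[id]
        }
    }
}

// What the card displays: the user's own press first (instant feedback),
// otherwise the outcome recorded in the map, otherwise still pending.
function cardStatus (id, localStatus, store) {
    return localStatus ?? store.toolApprovalStatuses[id] ?? 'pending'
}
```

Because the card reads from the map by id instead of from its detached answer copy, a status written by Stop reaches a card the user never pressed.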

On testing: understood you cannot exercise approve/deny on staging without the agent side synced. @cstns or @Steve-Mcl, a second look would be welcome given the reactivity change.

cstns and others added 4 commits July 1, 2026 22:53
#7598)

Co-authored-by: Steve-Mcl <sdmclaughlin@gmail.com>
Co-authored-by: Stephen McLaughlin <44235289+Steve-Mcl@users.noreply.github.com>
Co-authored-by: Andrea Palmieri <76187074+andypalmi@users.noreply.github.com>
Co-authored-by: andypalmi <andrea@flowfuse.com>
…omationsHandler integration

# Conflicts:
#	frontend/src/stores/context.js
…421)

Curate the FlowFuse platform automation tools from the handler singleton
(app.comms.platformAutomation) into the /mcp/tools catalog alongside the
flow-building tools, tagged group:'platform' so the UI routes them to their
own section with their own read/write/delete defaults. Read/write/delete
class is derived from each tool's MCP annotations; platform tools carry no
nr-assistant version window.
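
A rough illustration of deriving the class from MCP annotations, as this commit describes. `readOnlyHint` and `destructiveHint` are standard MCP tool annotation fields; mapping an explicit `destructiveHint: true` to the delete class, defaulting everything else to write, and the `toCatalogEntry` helper with its output shape are assumptions of this sketch.

```javascript
// Derive the permission class from a tool's MCP annotations.
function classFromAnnotations (annotations = {}) {
    if (annotations.readOnlyHint === true) return 'read'
    // Assumption: only an explicit destructive hint is treated as delete.
    if (annotations.destructiveHint === true) return 'delete'
    return 'write'
}

// Hypothetical catalog entry for a platform tool: tagged as the platform group,
// with no nr-assistant version window.
function toCatalogEntry (def) {
    return {
        name: def.name,
        group: 'platform',
        scope: classFromAnnotations(def.annotations)
    }
}
```
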
Replace the mid-turn approval round-trip with a stateless defer/resume flow so
the agent never stays resident waiting on a human. When a turn needs approval it
returns the approval card(s) and ends; the browser collects the decisions and
sends them back in one resume message that continues the turn.

- product-expert: track the open approval batch, resume once every card is
  answered, transport-agnostic (MQTT push or awaited HTTP reply).
- product-assistant: drop the promise-based pending-approval registry; the store
  now only records per-card outcome statuses and session grants.
- tests updated for the batch model.
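
The batch model could be pictured like this: collect one decision per card and fire a single resume once every card is answered. `createApprovalBatch` and the shape of the resume payload are hypothetical; the transport (MQTT push or HTTP reply) is left to the `onComplete` callback.

```javascript
// Track one open approval batch. `ids` are assumed unique.
function createApprovalBatch (ids, onComplete) {
    const expected = new Set(ids)
    const decisions = new Map()
    let sent = false
    return {
        // Record a decision; returns false if the id is unknown or the batch already resumed.
        decide (id, decision) {
            if (sent || !expected.has(id)) return false
            decisions.set(id, decision)
            if (decisions.size === expected.size) {
                sent = true
                // One resume message carrying every decision, in card order.
                onComplete([...expected].map(i => ({ id: i, decision: decisions.get(i) })))
            }
            return true
        },
        get isComplete () { return sent }
    }
}
```
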
…permissions

# Conflicts:
#	frontend/src/stores/context.js
#	test/unit/forge/routes/auth/permissions_spec.js
Temporary console traces to diagnose why a support-agent request takes
the HTTP path instead of MQTT. Logs the resolved feature checks that
drive shouldUseMqtt (platform external-broker flag, team-type broker
flag, and the combined value) plus the chosen transport per send.
Base automatically changed from feat/408-expert-plan-mode to main July 2, 2026 08:35
…permissions

# Conflicts:
#	frontend/src/components/expert/components/ExpertChatInput.vue
#	frontend/src/components/expert/components/messages/components/AnswerWrapper.vue
#	frontend/src/stores/context.js
#	frontend/src/stores/product-expert.js
andypalmi added 2 commits July 2, 2026 11:21
…a session grant

The individual-tools table sized the tool-name column to its content, so
promoting a per-chat session grant to a saved default removed the inline
session note and its action, shrinking that column and sliding the Type
(scope) column left. Let the tool-name column absorb the free space so the
Type and Permission columns stay pinned right regardless of the note.
andypalmi added 2 commits July 2, 2026 12:28
Platform and UI automation tools now carry a top-level title used as their
human-friendly label. The title is forwarded on the wire definitions and MCP
registration, and the platform catalog entry prefers it over the name-derived
label.
…lity

Send a supportsHITL flag in the Expert context. Instances predating the
human-in-the-loop tool permissions (#421) omit it, letting the agent fall back
to running flow-building tools at every scope and gating platform write/delete
tools instead of treating the user as read-only.
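
The agent-side fallback described here is not part of this repository, so the following is only a guess at its shape. `legacyFallbackPolicy` is a hypothetical name, and reading "gating" as the Ask policy is an assumption.

```javascript
// Returns a policy for instances that predate HITL, or null when the normal
// permission resolution should apply.
function legacyFallbackPolicy (tool, context) {
    if (context.supportsHITL === true) return null
    // Older instance: flow-building tools run at every scope.
    if (tool.group !== 'platform') return 'allow'
    // Platform tools: reads run, write/delete are gated (assumed to mean Ask).
    return tool.scope === 'read' ? 'allow' : 'ask'
}
```
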

/**
* Retrieve the curated tool catalog for the Expert's human-in-the-loop permissions UI
* (#421). Returns the merged catalog for both sections the UI shows:

Contributor

keep these concise


/**
* Maps a platform automation tool's wire definition into a catalog entry for the
* Expert permissions UI (#421). Platform tools carry standard MCP annotations

Contributor

a little bit more concise

const platformHandler = app.comms?.platformAutomation
if (platformHandler) {
    const platformDefs = platformHandler.getToolDefinitions() || []
    catalog.push(...platformDefs.map(curatePlatformTool))

Contributor

If we add titles on the MCP tools, do we need to map these?

},
// Size of the underlying buttons (passed through to ff-button). Defaults to
// 'medium' to match existing usages; 'small' suits dense contexts like tables.
size: {

Contributor

the prop name is self explanatory, adding possible options should be enough

@@ -0,0 +1,170 @@
<template>
<div class="json-viewer">

Contributor

We should scope this as a generic component, otherwise we'll end up replicating it. Please move it under frontend/src/components. It's not as abstracted as I would have liked, but we can do that later on.

confirm-label="Done"
:can-be-canceled="false"
data-el="expert-settings-dialog"
boxClass="max-w-[54rem]!"

Contributor

why is this needed?

<ff-data-table-row v-for="tool in group.tools" :key="tool.familyKey">
    <ff-data-table-cell class="tool-col">
        <div class="tool-permissions__cell">
            <span class="tool-permissions__title">{{ tool.displayName }}</span>

Contributor

This will be hard to maintain going forward. Encapsulating smaller self-contained Vue components adds not only ease of maintenance but also local state that can be managed at the component level, allowing you to avoid the sausage types of methods that are within this component.

  questionCadence: useProductExpertStore().questionCadence,
- planMode: useProductExpertStore().planMode
+ planMode: useProductExpertStore().planMode,
+ // Signals that this FlowFuse version implements human-in-the-loop tool

Contributor

No need for such a long-winded note; make it more concise.


const MAX_DEBUG_LOG_ENTRIES = 100 // maximum number of debug log entries to keep

// --- Expert tool permissions (human-in-the-loop, #421) -----------------------

Contributor

these feel like they belong in a helper

// Expert tool permissions (HITL, #421). The catalog + hash are refreshed from
// the agent; defaults + preferences are the user's choices (persisted below).
toolCatalog: [],
toolCatalogHash: null,

Contributor

These are expert concerns, not assistant, or am I wrong?

Labels

None yet

Projects

None yet

3 participants